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Abstract 

We review the continuity and differentiability properties of the stress loss function 
in MDS and the local and global convergence properties of the SMACOF algorithm. 
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1 Introduction 

This book is about the least squares loss function 

°(X) = \ £ <"«(*«-<*wP0) 2 - (1) 

Z 1 <i<j<n 

define on M nxp , the space of all n x p matrices. We follow Kruskal (1964) and call cr(X) the 
stress of configuration X. Minimizing stress over p-dimensional configurations is the pMDS 
problem. 

In (1) the matrices of weights W = {■?%•} and dissimilarities A = {(%} are symmetric, non¬ 
negative, and hollow (zero diagonal). The matrix-valued function D(X) = {djj(X)} contains 
Euclidean distances between the rows of the configuration A", which are the coordinates of n 
points in MP. Thus D(X) is also symmetric, non-negative, and hollow. 

Two important special cases of pMDS are Unidimensional Scaling, which is 1MDS, and 
Full-dimensional Scaling, which is nMDS. Because of their importance, they also have their 
very own acronyms UDS and FDS, and they have their own chapters in this book. 

In pMDS we minimize stress. This means that we are trying to find the global minimum over 
p-dimensional configurations. In practice, however, our algorithms find local minima, which 
may or may not be global. In this paper we will summarize what we know about global and 
local minima, and what we know about local maxima and saddle points. 

2 Notation 

First some convenient notation, first introduced in De Leeuw (1977). Vector e* has n elements, 
with element i equal to +1, and all other elements zero. A VJ is the matrix (e* — efi){ei — efi)', 
which means elements ( i,i ) and (j,j) are equal to +1, while {i, j) and (j,i) are —1. Thus 

d%{X) = (e-i - efi'XX'ie, - efi = tr X'A^X = tr A tj C, (2) 


with C = XX'. 


We also define 

V = ^2 w ijAij, 

and the matrix-valued function B(m) with 


B{X) 


E w v 

dij(X)> 0 


$ij 

dij(X) 


A ■ • 


( 3 ) 

( 4 ) 


and B(X) = 0 if X = 0. If we assume, without loss of generality, that 


r, E w ij ^ij ~ 3) 

1 
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then 


(5) 


a(X) = 1 - tr X'B(X)X + *tr X'VX. 

We also suppose, without loss of generality, that W is irreducible , so that the pMDS problem 
does not separate into a number of smaller pMDS problems. For symmetric matrices 
irreducibity means that we cannot find a permutation matrix II such that Yi'WYi is the direct 
sum of a number of smaller matrices. 

V is symmetric with non-positive off-diagonal elements. It is doubly-centered (rows and 
columns add up to zero) and thus weakly diagonally dominant. It follows that it is positive 
semi-definite (see Varga (1962), section 1.5). Because of irreducibility it has rank n — 1, and 
the vectors in its null space are all proportional to e, the vector with all elements equal to 
+1. The matrix B(X) is also symmetric, positive semi-definite, and doubly-centered for each 
X. It may not be irreducible, because for example R(0) = 0. 


3 Differentiability 


Obviously 


Vif(X) = 2VX 


Additionally p(«) is differentiable at X if and only if dij(X) > 0 for all i < j with WijSij > 0. 
In that case 


Vp(X) = B(X)X. 


Result 13: <t(«) is differentiable at at X if and only if dij(X) > 0 for all i < j with 

WijSij > 0. In that case 

Va{X) = (V - B{X))X. 

Result 14: If (x(«) is differentiable at the stationary point X then (V — B(X))X = 0, or, 
in fixed point form, X = V + B(X)X. 


4 DC Functions 

Following De Leeuw (1977) we also define 


p(X) = 

E WijSijdij (X) = tr X'B {X)X, 

(6) 




v\x) = 

E w ij d? ij (X)= tv X'VX. 

(7) 




Result 1: [DC] 

1. p(«) is convex, homogeneous of degree one, non-negative, and continuous. 

2. p 2 (#) is convex, homogeneous of degree two, non-negative, quadratic, and continuous. 

3. <r(«) is a difference of convex functions (a.k.a. a DC function or a delta-convex function). 
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See Hirriart-Urruty (1988) or Vesely and Zajicek (1989) for a general discussion of DC 
functions. 

From the literature we know cr(«) is locally Lipschitz, has directional derivatives and nonempty 
subdifferentials everywhere, and is twice differentiable almost everywhere. In fact. cr(») is 
infinitely many times differentiable at all X that have dij{X) > 0 for all i < j with > 0. 


5 Homogeneity 

Result 2: [Bounded] At a local minimum point X of a(») we have 77 (A) < 1. 

Proof: Homogeneity gives <r( XX) = 1 — A p(X) +| A 2 rj 2 (X). If X is a local minimum then the 
minimum over A is attained for A = 1, i.e. we must have p(X) = r] 2 (X). By Cauchy-Schwartz 
p(X) < r](X) for all A", and thus at a local minimum r/ 2 (X) < rj(X), i.e. 77 (A) < 1. ■ 

Result 3: [Local Maxima] X = 0 is the unique local maximum point of er(«) and <r(0) = 1 
is the unique local maximum. 

Proof: This is because cr(AA) = 1 — Xp(X) + |A 2 // 2 (A") is a convex quadratic in A. The 
only maximum on the ray through X occurs at the boundary A = 0. ■ 

Result 4: [Unbounded] <r(#) is unbounded and consequently has no global maximum. 

Proof: ■ 

Note that the unboundedness of stress also follows from the proof of result 3. 

6 Picture 

Here is a picture of stress on a two-dimensional subspace which illustrates both result 3 and 
result 15. 

We first make a global perspective plot, over the range (—2.5, +2.5). 
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stress 



theta 1 


Figure 1: Picture of Stress 

Note the sharp ridge going northwest-southeast in the plot, indicating a ray of configurations 
where one of more of the distances are zero, and where consequently <?(•) is not differentiable. 

We first make a global perspective plot, over the range (—2.5, +2.5). 
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Figure 1: Picture of Stress 

We first make a global perspective plot, over the range (—2.5, +2.5). 
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Figure 1: Picture of Stress 


7 One-sided Directional Derivatives 


For a function / : M. nxp 4lwe define the one-sided (Dini) directional derivative at A" in 
direction Y as 


df(X; Y) — lim 

e4,(J 


/(X + eY) - f(X) 


and the one-sided (Peano) second-order directional derivative at A" in direction Y as 


d 2 f(X-Y) 


f(X + eY)-f(X)-eVf(X-,Y) 

l 1 !?-o- 


We know that cr(«) has one-sided directional derivatives of orders one and two. These can be 
easily computed by expanding the function at X in direction Y. 
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7.1 Expansion 

We start by expanding squared distances and distances. First 


d%(X + eY) = d 2 l3 {X) + 2e tr Y'A tj X + e 2 tr Y'A {j Y. 


Then for djj(X) > 0 


dij(X + eY) 


dij (A) T e 


tr Y'AijX 
dij{X) 


+ 



1 

dij(X) 



(tr Y'AjjX) 2 
tr X'AijX 


( 8 ) 

o(e 2 ), (9) 


and for dij(X) = 0 


d ij (X + eY) = ed ij {Y). (10) 

Equations (8), (9), and (10) combine in a straightforward way to an expansion of er(«) at X 
in direction Y. 

Result 5: [Quadratic] 


a(X + eY) = a(X) + e \ tr Y\V - B(X))X - E w^d^Y) 

dij(X )=0 


-e 2 hr Y\V - B{X))Y + E 


w. 


s ‘> (t r yM w A7 l+o(f2) 


^>0 tr X'AijX 


There are some interesting special cases of this result. 

Result 6: [Antisymmetric] Suppose Y = XT, with T antisymmetric, so that X + eY' = 
X(I + eT). Then 

*{X + eY) = a(X) - e £ w^d^Y) + ^e 2 tr Y\V - B{X))Y + o(e 2 ) (12) 

dij(X )=0 1 

Result 7: [Singular] Suppose X_ = [X \ 0] and X — [0 ] Y] so that X_ + eY_ = [X \ eY]. 
Here X_ and Y are n x p, X is n x r, with r < p, and Y is n x (p — r ). Then 

<t(X + eY) = a(X) - e E WiAA^Y) + 1 e 2 tr Y\V - B(X))Y + o(e 2 ) (13) 

dij(X)=0 2 

Result 7: [Singular] Suppose X = [X \ 0] and X — [Z \ Y] so that X + eH = [X YeZ \ eY], 
Here X_ and V are n x p, X is n x r, with r < p, and Y is n x (p — r). Then 

a(X + eY) = a(X)-e E *%Mp(E + ^e 2 tr Y\V - 

dij(X )=0 1 


B(X))Y + o(e 2 ) (14) 



7.2 Derivatives 


The expansion in result 5 immediately gives the first-order and second-order directional 
derivatives. 


Result 8: [First Directional Derivatives] 

da(X ; Y) = tr X'{V - B{X))Y - E w^d^Y). 

dij (X )=0 

Result 9: [Second Directional Derivatives] 

d 2 a(X; Y) = tr Y'{V - B{X))Y + ^ 


Sij (tr Y'AijX) 2 
toX'AvX ' 


(15) 


(16) 


The expression for the second order derivatives can be made a bit more matrix and computer 
friendly by rewriting it differently. Define y = vec(T) and x = vec(X). Thus x and y have 
length rip. Also define ~Xij = I p ® Aij, with I p the identity matrix of order p and & the 
Kronecker product. Thus the rip x rip matrix ~/t l3 is the direct sum of p copies of our previous 
n x n matrix . In the same way we write \^ for I p ® V and t(X) for I p 0 B(X). 


With this new notation 

<e<r(X-,Y)=tf 


'\f-t(x)+ Y. 

{ dipxp 


w, 


AijXx'~X 


o 


, xj (X) >0 dij(X) x r~A 

Define, for some additional shorthand, the rip x rip matrix 






x 


G(X)= E 


w. 


6^ AijXx'~X 


o 


d i:j (X)> o 


dij{X ) x'AijX 




as 


well as H(X) = l5(X) - G(X). Note that 


H(X)= E 


w 


dij 




dij(X)> o 


da(X) 




~XijXx'~Xij 

rf! A , . rr» 
i Aj /i ^J 


y- 


(17) 


and consequently both G(X) and H(X) are positive semi-definite, with H{X ) singular 
because H(X)x = 0. This implies 


tr Y\V - B(X))Y < d 2 a(X ; Y) < tr Y'VY. 


Also note that if dij(X) > 0 for all i < j then <r(») is two times (Frechet) differentiable at X, 
with 

Vcr(X) = (V — B{X))X. (18) 

We have to be a bit careful with the formula for the second (Frechet) derivative, which is a 
map for which there is no straightforward matrix expression. Its value at Y is 

V 2 a(X)(Y) = d 2 a(X ; Y) = y'tf - H(X))y, (19) 

with y = vec(y) as usual. 


Note that result 6 imply that 
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8 Necessary Conditions 

The formulas for the first and second order one-sided directional derivatives can be used to 
give necessary conditions for a local minimum (see, for example, Bednarik and Pastor (2008) 
or Ivanov (2016)). 

Result 10: [Local Minimum 1] If X is a local minimum point of cr(#) then 

1 . djj(X) > 0 for all i < j with WijSij > 0, 

2. (V - B{X))X = 0. 

Proof: This follows from 8. Suppose (V — B(X))X ^ 0. Then we can find Y such that 
tr Y'(V — B(X))X < 0, and da(X;Y) < 0. Thus X is not a local minimum point. If 
(V — B(X))X = 0 and there is an i < j such that > 0 and dij(X) = 0 then again we 

can find Y such that da(X', Y) < 0, and thus ATs not a local minimum point. ■. 

This result was proved for the first time in De Leeuw (1984). Note that it implies that if 
> 0 for all i < j then <r(«) is two times (Frechct) differentiable in a neighborhood of 
each local minimum X, because all d tl (X) must be non-zero. But note there are situations 
where the result does not apply. In multidimensional unfolding there are row-points and 
column-points. Weights between two row-points and between two-column points are zero, 
and thus at local minima distances between row points and between column points can be 
zero, and stress is not differentiable. Also in the case of perfect fit, if Sij = dij(Z) for some Z , 
and one of more of the dij(Z) are zero, we have cr(Z) = 0 and thus some of the distances for 
the global minimum, which is clearly a local minimum, are zero. 

Result 11: [Local Minimum 2] If X is a local minimum point of cr(«) then — H(X) is 

positive semi-definite. 

Result 12: [Local Minimum 3] If [X | 0] is a local minimum point of a(«) then 

1 . dij(X) > 0 for all i < j with w^S^j > 0. 

2 . (V - B{X))X = 0. 

3. V — B(X) is positive semi-definite. 

Result 16: If X is a stationary point of cr(«) then r](X) < 1. 

Proof: At a local minimum (V — B(X))X = 0 which implies p(X) = r] 2 (X). By Cauchy- 
Schwartz p(X) < rj(X) and thus p 2 (X) < r)(X ), which implies r)(X) < 1. ■ 

9 Full-dimensional Scaling 

10 Unidimensional Scaling 

11 Subdifferentials 

The sub differential of a convex function /(•) at a point x is the set d f(x) defined by 

9f{x) = {y\f(z)>f{x)+y'(z-x) V^} 
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If / is differentiable at x the sub differential is a singleton, and df(x) = {Df(x)} (Rockafellar 
(1970), Section 23). In general 


df(x | y) - sup {| z edf(x)} 
dd%(X) = {2 AijX}, 

To compute the subdifferential ddij(X) we use the representation 

dij(X) = max (e* — ej)' Xu. 

u'u —1 


From this we have for d l3 (X) > 0 

dd,A x) = {j-^XjX}, 

while for dij(X) = 0 

ddij(X) = x{(ej — ej)u \ u'u = 1}, 
where >«(•) is the closed convex hull. 

A stationary point of a locally Lipschitz function / is a point x where 0 G df(x). 

12 SMACOF 
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